Abstract. Dual pivot quicksort refers to variants of classical quicksort
where in the partitioning step two pivots are used to split the input
into three segments. This can be done in different ways, giving rise to
different algorithms. Recently, a dual pivot algorithm due to Yaroslavskiy
received much attention, because it replaced the well-engineered quicksort
algorithm in Oracle's Java 7 runtime library. Nebel and Wild (ESA
2012, best paper award) analyzed this algorithm and showed that on
average it uses $1.9 n \ln n + O(n)$ comparisons to sort an input of size $n$,
beating standard quicksort, which uses $2 n \ln n + O(n)$. We introduce a
model that captures all dual pivot algorithms, give a unified analysis,
and identify a new dual pivot algorithm that minimizes the expected
number of key comparisons among all possible algorithms. This minimum
is $1.8 n \ln n + o(n \ln n)$. We identify a dual pivot strategy whose comparison
number is only $O(n)$ away from the optimum. If pivots are chosen from a
sample of size 5, the minimum is $1.623 n \ln n + o(n \ln n)$. We also include
remarks about minimizing the expected number of swaps.



1 Introduction 

Quicksort [6] is a thoroughly analyzed classical sorting algorithm, described in
standard textbooks [2,7,11] and with implementations in practically all algorithm
libraries. Following the divide-and-conquer paradigm, on an input consisting of $n$
elements to be sorted quicksort uses a pivot element to partition its input elements
into two parts, those smaller than the pivot and those larger than the pivot, and
then uses recursion to sort these parts. It is well known that if the input consists
of $n$ elements with distinct keys in random order and the pivot is picked by
just choosing an entry then quicksort uses an expected number of $2 n \ln n + O(n)$
comparisons. In 2009, Yaroslavskiy announced¹ that he had found an improved
quicksort implementation, the claim being backed by experiments. After extensive
empirical studies, in 2009 Yaroslavskiy's algorithm became the new standard
quicksort algorithm in Oracle's Java 7 runtime library. This algorithm employed
two pivot elements to split the elements. If two pivots $p$ and $q$ with $p < q$ are used,
the partitioning step partitions the remaining $n - 2$ elements into 3 parts: those
smaller than $p$ (small elements), those in between $p$ and $q$ (medium elements),
and those larger than $q$ (large elements), see Fig. 1.² Recursion is then applied
to the three parts. As remarked in [12], it came as a surprise that two pivots
should help, since in his thesis [9] Sedgewick had proposed and analyzed a dual
pivot approach that was inferior to classical quicksort. Later, Hennequin in his
thesis [5] studied the general approach of using $s > 1$ pivot elements, but found
no improvements over standard quicksort.

Fig. 1. Result of the partition step in dual pivot quicksort schemes using two
pivots $p, q$ with $p < q$. All elements smaller than $p$ are moved to the left of $p$; all
elements larger than $q$ are moved to the right of $q$. All other elements lie between
$p$ and $q$.

¹ An archived version of the relevant discussion in a Java newsgroup can be found at
http://permalink.gmane.org/gmane.comp.java.openjdk.core-libs.devel/2628.
Also see [12].

In [12] (best paper award at ESA 2012) Nebel and Wild analyzed a simplified
version of Yaroslavskiy's algorithm. (For completeness, this algorithm is given as
Algorithm 7 in Appendix B.2.) They showed that it makes $1.9 n \ln n + O(n)$ key
comparisons in expectation, in contrast to the $2 n \ln n + O(n)$ of standard quicksort
and the $\frac{32}{15} n \ln n + O(n)$ of Sedgewick's dual pivot algorithm. On the other hand,
they showed that the number of expected swap operations in Yaroslavskiy's
algorithm is $0.6 n \ln n + O(n)$, which is much higher than the $0.33 n \ln n + O(n)$
expected swap operations in classical quicksort. In this paper, also following
tradition, we concentrate on the comparison count and on asymptotic results.

The authors of [12] state that the reason for Yaroslavskiy's algorithm being
superior was that his "partitioning method is able to take advantage of certain
asymmetries in the outcomes of key comparisons." They also state that their
"Algorithm 2 [Sedgewick's dual pivot method] fails to utilize them, even though
being based on the same abstract algorithmic idea." So the abstract algorithmic
idea of using two pivots can lead to different algorithms with different behavior. In
this paper we describe the design space from which all these algorithms originate.
We fully explain which simple property makes some dual pivot algorithms perform
better and some perform worse, and identify an optimal member of this design
space, which will use $1.8 n \ln n + o(n \ln n)$ comparisons on average, even fewer than
Yaroslavskiy's method.

The first observation is that everything depends on the cost of the partitioning
step. This is not new at all. Actually, in Hennequin's thesis [5] the connection
between partitioning cost and overall comparisons for quicksort variants with
more than one pivot is analyzed in detail. The result relevant for us is that
if two pivots are used and the (expected) partitioning cost for $n$ elements can
be bounded by $a \cdot n + O(1)$, for a constant $a$, then the expected number of
comparisons for sorting is

$$\frac{6}{5}\, a \cdot n \ln n + O(n). \tag{1}$$

(In this paper the constant in the leading term is all that interests us. The
reader should be warned that for real-life $n$ the linear term can have a big
influence on the expected number of comparisons.)

² For ease of discussion we assume in this theoretical study that all elements have
different keys. Of course, in implementations equal keys are an important issue that
requires a lot of care [10].

The second observation is that the partitioning cost depends on certain details
of the partitioning procedure. This is in contrast to standard quicksort with
one pivot where partitioning always takes $n - 1$ comparisons. In [12] it is shown
that Yaroslavskiy's partitioning procedure needs $\frac{19}{12} n + O(1)$ comparisons, while
Sedgewick's needs $\frac{16}{9} n + O(1)$. For understanding what is going on it is helpful
to forget about concrete implementations with loops in which indices move along
in arrays, and entries are swapped, and look at partitioning with two pivots
in a more abstract way: Assume for simplicity the input is a permutation of
the set $\{1, \ldots, n\}$, and pivots $p$ and $q$ with $p < q$ have been chosen. The task is
to classify the other $n - 2$ elements into the classes "small" ($s = p - 1$ many),
"medium" ($m = q - p - 1$ many), and "large" ($\ell = n - q$ many), by comparing
these elements one after the other with $p$ or $q$, or both of them if necessary. The
only choice the algorithm can make is whether to compare with $p$ or $q$ first. In
Sedgewick's and in Yaroslavskiy's algorithm this decision is made on grounds
of the position of certain pointers and the state of control. Seen in an abstract
way, it is simply "$p$ first" or "$q$ first". Let $s_q$ be the number of small elements
compared with $q$ first, and $\ell_p$ be the number of large elements compared with $p$
first. Then the total number of comparisons is

$$n - 2 + m + s_q + \ell_p. \tag{2}$$
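To make the counting behind (2) concrete, the following Python sketch classifies the non-pivot elements under a caller-supplied rule for which pivot is tried first. It is an illustration only: the function name `classify` and the callback `compare_p_first` are hypothetical and do not come from any of the algorithms discussed here.

```python
def classify(others, p, q, compare_p_first):
    """Classify the non-pivot elements w.r.t. pivots p < q.

    compare_p_first(i) decides, without looking at the element itself,
    whether the i-th element is compared with p first (True) or q first.
    Returns (comparisons, s_q, l_p) in the notation of (2).
    """
    comparisons = 0
    s_q = 0  # small elements that were compared with q first
    l_p = 0  # large elements that were compared with p first
    for i, x in enumerate(others):
        comparisons += 1              # first comparison, with p or with q
        if compare_p_first(i):
            if x < p:
                continue              # small: settled by one comparison
            comparisons += 1          # second comparison, with q
            if x > q:
                l_p += 1              # large element saw the "wrong" pivot first
        else:
            if x > q:
                continue              # large: settled by one comparison
            comparisons += 1          # second comparison, with p
            if x < p:
                s_q += 1              # small element saw the "wrong" pivot first
    return comparisons, s_q, l_p
```

For any rule, the returned comparison count equals $n - 2 + m + s_q + \ell_p$, since every element costs one comparison, every medium element a second one, and every wrongly started small or large element a second one as well.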

On average over all possible positions of the pivots $p$ and $q$ the term $n - 2 + m$
will lead to $\frac{4}{3} n + O(1)$, independent of the algorithm. The only quantity that
depends on the partition procedure is $w = s_q + \ell_p$, the number of elements that
are compared with the "wrong" pivot first. Given a partitioning procedure, this
is a random variable, depending on the random order of the elements. Fix $p$ and
$q$, let $s = p - 1$ and $\ell = n - q$, and let $w_{s,\ell}$ be the expectation of $w$, conditioned
on these values, averaged over all possible arrangements of the remaining $n - 2$
elements. Now, if (in expectation) $f_p$ elements are compared with $p$ first and (in
expectation) $f_q$ elements are compared with $q$ first, then we will show that

$$w_{s,\ell} = f_q \cdot s/n + f_p \cdot \ell/n + o(n), \tag{3}$$

by the randomness of the order. The details of the algorithm will determine
$f_p$ and $f_q$ and hence $w_{s,\ell}$ as well, and thus decide whether the partitioning
algorithm and hence the whole sorting procedure uses fewer or more comparisons.
This seemingly simple insight is the first main message of this paper. It has two
consequences, one for the analysis and one for the design of dual-pivot algorithms:

(i) In order to analyze a dual pivot algorithm, given by its partition procedure,
find out what $f_p$ and $f_q$ are for this algorithm. This will give $w_{s,\ell}$, which must
then be averaged over all $p, q$ to find the expected number of comparisons.
Then apply (1).

(ii) In order to design a good partition procedure try to make $f_q \cdot s/n + f_p \cdot \ell/n$
small.

We shall demonstrate approach (i) in Section 4. An example: As explained in
[12], in Yaroslavskiy's algorithm we have $f_q \approx \ell$ and $f_p \approx s + m$. Thus for given
$p$ and $q$ we have $w_{s,\ell} = (\ell s + (s + m)\ell)/n + o(n)$. This must be averaged over all
possible values of $p$ and $q$ (to yield $\frac{1}{4} n + o(n)$, which together with $\frac{4}{3} n + O(1)$
gives $\frac{19}{12} n + o(n)$, close to the result from [12]).
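The averaging step in this example can be spelled out as follows. This is a sketch of the standard calculation, under the assumption that the pivot pair is uniform over all pairs $1 \le p < q \le n$, so that $(s, m, \ell)$ is uniform over all triples of nonnegative integers with $s + m + \ell = n - 2$:

$$E(s\ell) = E(m\ell) = \frac{(n-2)(n-3)}{12}, \qquad \frac{1}{n}\, E\bigl(\ell s + (s + m)\ell\bigr) = \frac{3\,(n-2)(n-3)}{12\, n} = \frac{n}{4} + O(1).$$

Adding the algorithm-independent part $\frac{4}{3} n + O(1)$ gives $\frac{19}{12} n + o(n)$, and the factor $\frac{6}{5}$ from (1) turns this into the leading coefficient $\frac{6}{5} \cdot \frac{19}{12} = 1.9$.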

Principle (ii) will be used to identify optimal partition procedures, beating
Yaroslavskiy's method. In brief, such a strategy should achieve the following: If
$s > \ell$, compare (almost) all entries with $p$ first ($f_p \approx n$ and $f_q \approx 0$), otherwise
compare (almost) all entries with $q$ first ($f_p \approx 0$ and $f_q \approx n$). Of course, some
details have to be worked out: How can the algorithm decide which case applies?
In which technical sense is this strategy optimal? This will be done in Sect. 5.

Contribution of this paper. We analyze known approaches to dual pivot quicksort
in a unified way. The analysis of all these methods boils down to finding out
which fractions of the elements are compared with the smaller resp. larger pivot
first. We show that the following strategy is optimal: If there are more small
elements than large elements, then always compare with the smaller pivot first.
Otherwise, always compare with the larger pivot first. In implementing this, we
utilize the well-known strategy of random sampling, which can be incorporated
directly into the partitioning step to obtain a quite natural optimal partitioning
procedure. For the analysis, we use suitable (martingale) tail bounds. Using
different methods, in Section 3 we exhibit and analyze an algorithm whose
expected number of comparisons is only $O(n)$ away from the absolute minimum.
When implementing dual pivot quicksort, it is beneficial to choose the pivot
elements as the second- and fourth-largest in a sample of size 5. We analyze
this in Sect. 6. For the question of how to decrease the number of swaps, it is
interesting to reconsider algorithms for the Dutch national flag problem [1] that
have been used for quicksort with equal keys [10]. This is done in Section 7.

Results. The theoretical results of our paper are summarized in Table 1. The new
partitioning method achieves the lowest expected number of key comparisons up
to an additive error of $o(n \ln n)$. When pivots are chosen from a small sample,
our method significantly decreases the leading coefficient when compared to the
well-understood median of three quicksort. This is in contrast to Yaroslavskiy's
algorithm, which decreases this coefficient only by 0.01.

To reduce the expected number of swap operations, dual pivot quicksort
algorithms can build upon the work that has been done in the context of the 
Dutch national flag problem (see, e.g., PQ). Using previous work on this problem, 
we show that one can in fact decrease the expected swap count drastically from 
$0.6 n \ln n + O(n)$ (Yaroslavskiy's algorithm) to $0.33.. n \ln n + O(n)$.

As noted by Wild et al. [13], considering only key comparisons and swap
operations does not suffice to evaluate the practicability of sorting algorithms. In
Section 8 we will see the following experimental results: When sorting integers,
our method is slower than Yaroslavskiy's algorithm. When sorting strings (and
key comparisons become more expensive), we gain a small advantage.

Method     E(C_n)                       Method           E(C_n)
QS         2.0 n ln n + O(n)            CQS (1/3)        1.714 n ln n + O(n)
Yaro       1.9 n ln n + O(n)            Yaro (2/5)       1.704 n ln n + O(n)
Optimal    1.8 n ln n + o(n ln n)       Optimal (2/5)    1.623 n ln n + o(n ln n)

Table 1. Main results of our paper. On the left, we show the expected comparison
count $E(C_n)$ of the methods considered in this paper. (QS: Quicksort, Yaro:
Yaroslavskiy's algorithm, Optimal: Our proposed method.) On the right, we
show variants of these algorithms that choose the pivots from a small sample
of the array. Clever quicksort (CQS) uses the median of three elements; Yaroslavskiy's
algorithm and our approach use the second- and fourth-largest elements from a
sample of five elements. The information-theoretic lower bound for comparison
based sorting is $1.44269.. \cdot n \ln n + O(n)$.

2 Average Case Analysis of Dual Pivot Quicksort 

We assume the input sequence $(a_1, \ldots, a_n)$ to be a random permutation of
$\{1, \ldots, n\}$, each permutation occurring with probability $1/n!$. If $n \le 1$, there
is nothing to do; if $n = 2$, we sort by one comparison. Otherwise, choose the
first element $a_1$ and the last element $a_n$ as pivots, and set $p = \min(a_1, a_n)$
and $q = \max(a_1, a_n)$. Clearly, each pair $p, q$ with $1 \le p < q \le n$ appears as
pivots with probability $1/\binom{n}{2}$. We count the number of key comparisons needed
to sort the given input. Let $C_n$ be the random variable counting this number
for a random input. Let $P_n$ denote the partitioning cost to partition the input
sequence into small, medium, and large elements. As explained by Wild and
Nebel [12, Appendix A], the expected number of key comparisons obeys the
following recurrence:

$$E(C_n) = E(P_n) + \frac{2}{n(n-1)} \cdot 3 \sum_{k=0}^{n-2} (n - k - 1) \cdot E(C_k).$$

If $E(P_n) = a \cdot n + O(1)$, for a constant $a$, this can be solved (cf. [5,12]) to give

$$E(C_n) = \frac{6}{5}\, a \cdot n \ln n + O(n). \tag{4}$$
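As a numerical illustration of how (4) arises from the recurrence, the following Python sketch evaluates the recurrence with two running sums and estimates the leading coefficient. It rests on assumptions that are not part of the analysis: the partitioning cost is taken to be exactly $a \cdot n$ for all $n \ge 2$ (so the base case $n = 2$ costs $2a$ rather than one comparison), and the names `expected_comparisons` and `leading_coefficient` are hypothetical.

```python
import math

def expected_comparisons(n_max, a):
    """E(C_n) for n = 0..n_max from the recurrence, assuming E(P_n) = a * n."""
    c = [0.0] * (n_max + 1)
    sum_c = 0.0   # running sum of c[k] for k <= n - 2
    sum_kc = 0.0  # running sum of k * c[k] for k <= n - 2
    for n in range(2, n_max + 1):
        k = n - 2
        sum_c += c[k]
        sum_kc += k * c[k]
        # sum_{k=0}^{n-2} (n - k - 1) * c[k] = (n - 1) * sum_c - sum_kc
        c[n] = a * n + 6.0 / (n * (n - 1)) * ((n - 1) * sum_c - sum_kc)
    return c

def leading_coefficient(n, a):
    """Estimate the n ln n coefficient; the linear term cancels in c[2n] - 2 c[n]."""
    c = expected_comparisons(2 * n, a)
    return (c[2 * n] - 2.0 * c[n]) / (2.0 * n * math.log(2.0))
```

With $a = 19/12$ the estimate lies close to $\frac{6}{5} \cdot \frac{19}{12} = 1.9$. Dividing $E(C_n)$ by $n \ln n$ directly converges much more slowly, because of the linear term.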

In the following, we will analyze different strategies that solve the following
(simplified) problem: Given a random permutation $(a_1, \ldots, a_n)$ of $\{1, \ldots, n\}$ as
the input sequence and $a_1$ and $a_n$ as the two pivots $p$ and $q$, classify each of
the remaining $n - 2$ elements as being small (i.e., smaller than $p$), medium (i.e.,
larger than $p$ but smaller than $q$), or large (i.e., larger than $q$). We are interested
in the number of key comparisons needed to classify these elements. Note that in
our setting there are exactly $s := p - 1$ small elements, $m := q - p - 1$ medium
elements, and $\ell := n - q$ large elements. Although this classification does not
yield an actual partition of the input sequence, for all approaches considered in
this paper it is possible to generalize the classification algorithm to a partitioning
algorithm using only swap operations but no additional key comparisons.

We fix the following assumptions, which are valid for all classification algorithms.
Classifying each of the remaining $n - 2$ elements w.r.t. the two pivots $p$ and
$q$ needs 1 key comparison to check if $a_1 < a_n$. Afterwards, each of the $n - 2$
elements has to be compared at least once against $p$ or $q$. Additionally, each
medium element has to be compared to $p$ and $q$. We expect $(n - 2)/3$ medium
elements. For given $p$ and $q$, let $s_q$ denote the number of small elements compared
to $q$ first (the number of small elements that need 2 comparisons for classification),
and let $\ell_p$ denote the number of large elements compared to $p$ first (the number
of large elements that need 2 comparisons for classification). Using conditional
expectation on the pivot choices, we calculate $E(P_n)$ as follows:

$$E(P_n) = (n - 1) + (n - 2)/3 + \frac{1}{\binom{n}{2}} \sum_{1 \le p < q \le n} \bigl( E(s_q) + E(\ell_p) \bigr). \tag{5}$$

We call the third summand the additional cost term, as it is the only value that
depends on the actual classification algorithm.

3 Analyzing the Additional Cost Term 

To analyze the additional cost term, we will use the following formalization: A
classification algorithm (or strategy) is a three-way decision tree $T$ with a root
and $n - 2$ levels of inner nodes as well as one leaf level. The root of this decision
tree is on level 0. Each node $v$ is labeled with an index $i \in \{2, \ldots, n - 1\}$ and
an element $l(v) \in \{\sigma, \lambda\}$. If $l(v)$ is $\sigma$, then $a_i$ is compared with the smaller pivot
first; otherwise, i.e., $l(v) = \lambda$, it is compared with the larger pivot first. On each
of the $3^{n-2}$ paths each index occurs exactly once. The three edges out of a node
are labeled $\sigma, \mu, \lambda$, resp., representing the outcome of the classification as small,
medium, large, respectively. For an edge $e$ its label is given by $l'(e)$. For each
input there is exactly one associated path $w$ from the root of the tree to a leaf;
the classification of the elements can then be read off from the node and edge
labels.

Identifying a path $w$ from the root to a leaf $v$ by the sequence of nodes and
edges $(v_1, e_1, v_2, e_2, \ldots, v_{n-2}, e_{n-2}, v)$ on it, we define the cost $c_w$ as

$$c_w = \bigl| \{ j \in \{1, \ldots, n - 2\} \mid l'(e_j) \neq \mu,\ l(v_j) \neq l'(e_j) \} \bigr|.$$

For a given input, the cost of the path associated with this input exactly describes
the number $s_q + \ell_p$ of additional comparisons on this input. An example for such
a decision tree is given in Figure 2.

For a random input, let $p_v$ be the probability that node $v$ is reached. The
probability that on the $i$-th level the algorithm classifies an element as being
small, medium, or large, resp., depends only on the number of small, medium,
and large elements, resp., classified so far, but not on which of the $n - 2 - i$ still
unclassified elements is chosen. This follows from the randomness of the input
sequence. So w.l.o.g. all nodes on level $i$ are labeled with the index $i + 2$, and
we can forget these names.

Fig. 2. An example for a decision tree to classify three elements $a_1$, $a_2$, and $a_3$.
Five out of the 27 leaves are explicitly drawn, showing the classification of the
elements and the costs $c_i$ of the specific paths.



3.1 Two Optimal Strategies 

Having fixed this notation, we can look for an optimal strategy, i.e., a strategy
that minimizes the additional cost term. We first study the (unrealistic!) setting
where $s$ and $\ell$, i.e., the number of small resp. large elements, are known to the
algorithm after the pivots are chosen, and the algorithm can use a different
decision tree for each such pair of values. For this, we say that $s_i$ and $\ell_i$, resp.,
are the number of small and large elements, resp., that have been classified in
the first $i - 1$ steps. We study the following strategy O: Given $s$ and $\ell$, compare
the $i$-th element to $p$ first if $s - s_i > \ell - \ell_i$, otherwise, compare it with $q$ first.³

Theorem 1. Strategy O is optimal. 

Proof. Fix $p$ and $q$ (and thus $s$, $m$, and $\ell$) and let an arbitrary decision tree $T$
be given. For a node $v$ in $T$, we let $Y_v = 0$ if $l(v) = \sigma$, and $Y_v = 1$ if $l(v) = \lambda$.
For a node $v$ at level $i$, we let $s_v$, $m_v$, and $\ell_v$, resp., denote the number of edges
labeled $\sigma$, $\mu$, and $\lambda$, resp., from the root to $v$. For a random input, the probability
that the element classified in this node is small is exactly $(s - s_v)/(n - i - 2)$.
Furthermore, the probability that it is medium is $(m - m_v)/(n - i - 2)$, and that
it is large is $(\ell - \ell_v)/(n - i - 2)$.

The probability that on a random input the edge labeled $\sigma$ of a node $v$ on
level $i$ is reached by the algorithm is then $p_v \cdot (s - s_v)/(n - i - 2)$. Conversely,
the probability that the edge labeled $\lambda$ is reached is $p_v \cdot (\ell - \ell_v)/(n - i - 2)$.

We calculate:

$$E(s_q + \ell_p) = \sum_{v \in T} p_v \cdot \left( Y_v \cdot \frac{s - s_v}{n - 2 - \mathrm{level}(v)} + (1 - Y_v) \cdot \frac{\ell - \ell_v}{n - 2 - \mathrm{level}(v)} \right). \tag{$*$}$$

For each node $v \in T$, strategy O labels it in a way that minimizes the corresponding
summand in ($*$), and hence it minimizes the additional cost term. □

³ This strategy was suggested by Thomas Hotz via personal communication.

While being optimal w.r.t. minimizing the additional cost term, this strategy
assumes that the exact number of small and large elements is known, which of
course is never true in a real algorithm. Next, we will show that a single decision
tree shared among all choices of $p$ and $q$ can be optimal up to a very small error
term.

We study the following strategy C: Given $p$ and $q$, compare the $i$-th element
to $p$ first if $s_i > \ell_i$, otherwise, compare it with $q$ first.

It is not hard to see that for a given input the number of additional comparisons
of strategy O and C can differ significantly. The next theorem shows that averaged
over all possible inputs, however, there exists only a small difference.

Theorem 2. Let $E(P_n^{O})$ be the expected number of key comparisons needed to
classify an input of $n$ elements using strategy O. Then $E(P_n^{C}) = E(P_n^{O}) + O(\log n)$.

The proof of this theorem uses the following key insight: Assume that strategy O
inspects the elements in the order $a_n, \ldots, a_1$, while C uses the order $a_1, \ldots, a_n$.
If the strategies compare the element to different pivots, then $s_{i+1} = \ell_{i+1}$,
i.e., there are exactly as many small elements as there are large elements in
$a_1, \ldots, a_i$. To analyze the difference in the value of the additional cost term, it
hence suffices to calculate the expected number of positions $i \in \{1, \ldots, n\}$, where
$s_{i+1} = \ell_{i+1}$. A rather standard but lengthy calculation shows that this number
is $O(\log n)$. The details of the proof are given in Appendix A. In that section,
we also show that an additive error term of $O(\log n)$ does not increase the total
expected number of key comparisons when we use such a strategy to implement
a dual pivot quicksort algorithm.

While both of these strategies are highly adaptive and decide for each element 
anew which of the pivots is used for the first comparison, this is by no means a 
requirement for a strategy to be optimal. 

Section 5 will introduce a strategy that either compares all elements to p first
or compares all elements to q first, based on the result of a small sampling phase. 
This strategy will be optimal up to lower order terms as well. 



Before we can get there, we have to focus on the value of the additional cost
term for an arbitrary comparison tree, i.e., an arbitrary classification algorithm.



3.2 Calculating the Value of the Additional Cost Term From a 
Comparison Tree 

Now we describe the connection between the comparison tree $T$ and the additional
cost term in general. Let $f_p$ resp. $f_q$ denote the expected number of elements
compared with $p$ resp. $q$ first. We can calculate these two values as

$$f_p = \sum_{v \in T,\, l(v) = \sigma} p_v, \qquad f_q = \sum_{v \in T,\, l(v) = \lambda} p_v.$$

The following theorem shows that these two parameters fully specify the additional
cost term up to lower order terms.

Theorem 3. Let $p$ and $q$ be the two pivots. Let $T$ be a decision tree, and let $f_p$
resp. $f_q$ be as defined above. Then

$$E(s_q + \ell_p) = f_q \cdot s/(n - 2) + f_p \cdot \ell/(n - 2) + o(n).$$

Proof. Fix $p$ and $q$ (and thus $s$, $m$, and $\ell$). For a node $v \in T$, let $p_v$ denote the
probability that node $v$ is reached by the algorithm on a random input.
For a given node $v$ on level $i$, we call $v$ good if $l(v) = \lambda$ and

$$\left| \frac{s}{n - 2} - \frac{s - s_v}{n - i - 2} \right| = o(1),$$

or $l(v) = \sigma$ and

$$\left| \frac{\ell}{n - 2} - \frac{\ell - \ell_v}{n - i - 2} \right| = o(1).$$

Otherwise, $v$ is called bad.

For a random input, we calculate:

$$\begin{aligned}
E(s_q + \ell_p) &= \sum_{v \in T,\, l(v) = \lambda} p_v \cdot \frac{s - s_v}{n - 2 - \mathrm{level}(v)} + \sum_{v \in T,\, l(v) = \sigma} p_v \cdot \frac{\ell - \ell_v}{n - 2 - \mathrm{level}(v)} \\
&\le \sum_{\substack{v \in T,\, l(v) = \lambda \\ v \text{ good}}} p_v \left( \frac{s}{n - 2} + o(1) \right) + \sum_{\substack{v \in T,\, l(v) = \lambda \\ v \text{ bad}}} p_v + \sum_{\substack{v \in T,\, l(v) = \sigma \\ v \text{ good}}} p_v \left( \frac{\ell}{n - 2} + o(1) \right) + \sum_{\substack{v \in T,\, l(v) = \sigma \\ v \text{ bad}}} p_v \\
&\le \frac{s}{n - 2} \cdot f_q + \frac{\ell}{n - 2} \cdot f_p + o(n) + \sum_{v \in T,\, v \text{ bad}} p_v,
\end{aligned} \tag{6}$$

where the first and second summand follow by the definition of $f_p$ and $f_q$. For the
third summand, consider each of the levels of the decision tree separately. Since
the probabilities $p_v$ for nodes $v$ on the same level sum up to 1, the contribution
of the $o(1)$ terms is bounded by $o(n)$. Now, it remains to show that the last
summand is $o(n)$.

To see this, consider a random input that is classified by this decision tree.
We will show that with very high probability we do not reach a bad node in the
decision tree for the first $n - n^{3/4}$ levels. Let $X_j$ be the 0-1 random variable that is
1 if the $j$-th classified element is small; let $Y_j$ be the 0-1 random variable that is 1
if the $j$-th classified element is large. Let $s_i = X_1 + \ldots + X_i$ and $\ell_i = Y_1 + \ldots + Y_i$.
We will use the method of averaged bounded differences to show that these
random variables are tightly concentrated around their expectation.

Claim. Let $1 \le i \le n - 2$. Then

$$\Pr\bigl(|s_i - E(s_i)| > n^{2/3}\bigr) \le 2 \exp(-n^{1/3}), \quad \text{and} \quad \Pr\bigl(|\ell_i - E(\ell_i)| > n^{2/3}\bigr) \le 2 \exp(-n^{1/3}).$$

Proof. We focus on proving the first inequality. The second inequality follows 
analogously. 

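First, we calculate the difference $c_j$ between the expectation of $s_i$ conditioned
on $X_1, \ldots, X_j$ resp. $X_1, \ldots, X_{j-1}$, for $1 \le j \le i$.
We have

$$\begin{aligned}
c_j &= \bigl| E(s_i \mid X_1, \ldots, X_j) - E(s_i \mid X_1, \ldots, X_{j-1}) \bigr| \\
&= \left| s_j + (i - j) \cdot \frac{s - s_{j-1} - X_j}{n - j - 2} - s_{j-1} - (i - j + 1) \cdot \frac{s - s_{j-1}}{n - j - 1} \right| \\
&= \left| X_j - \frac{(i - j) X_j}{n - j - 2} - (s - s_{j-1}) \cdot \frac{n - i - 2}{(n - j - 2)(n - j - 1)} \right| \\
&= \frac{n - i - 2}{n - j - 2} \cdot \left| X_j - \frac{s - s_{j-1}}{n - j - 1} \right| \le 1.
\end{aligned}$$

(This is for $j < n - 2$; for $j = i = n - 2$ the last element is determined by the
previous ones and $c_j = 0$.) The final bound holds because $j \le i$ and both $X_j$
and $(s - s_{j-1})/(n - j - 1)$ lie in $[0, 1]$.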



We use the following bound known as the method of averaged bounded differences 
(see 01 Theorem 5.3]): 



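$$\Pr\bigl(|s_i - E(s_i)| > t\bigr) \le 2 \exp\left( - \frac{2 t^2}{\sum_{j \le i} c_j^2} \right),$$

and, since $c_j \le 1$ for all $j$, get

$$\Pr\bigl(|s_i - E(s_i)| > n^{2/3}\bigr) \le 2 \exp\left( - \frac{2 n^{4/3}}{i} \right),$$

which is not larger than $2 \exp(-n^{1/3})$.

□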
Now, consider the term $s - s_i$. With high probability we have

$$s - s_i = (n - i - 2) \cdot \frac{s}{n} + O(n^{2/3})$$

and get

$$\frac{s - s_i}{n - i - 2} = \frac{s}{n} + O\!\left( \frac{n^{2/3}}{n - i - 2} \right).$$

The second summand is $o(1)$ for $i \le n - n^{3/4}$. (We get an analogous result for
$(\ell - \ell_i)/(n - i - 2)$.) So, with high probability we reach a good node on level $i$,
for $i \le n - n^{3/4}$. Thus, the contribution of the sums of the probabilities of bad
nodes on a fixed level is $o(1)$ and this contributes $o(n)$ to the latter summand in
(6) for the first $n - n^{3/4}$ levels. For the last $n^{3/4}$ levels of the tree, we use that
the contribution of the probabilities that we reach a bad node on level $i$ is at
most 1 for a fixed level.

This shows that the latter summand in (6) is $o(n)$, and proves the theorem.

□

This means that for the analysis of the expected comparison count of a dual
pivot quicksort algorithm we just have to find out what $f_p$ and $f_q$ are for this
algorithm. Moreover, to design a good algorithm (w.r.t. the expected comparison
count), we should try to make $f_q \cdot s/(n - 2) + f_p \cdot \ell/(n - 2)$ small.

The theorem has two technical implications. First, using Theorem 3 in (5),
we obtain by a standard calculation

$$E(P_n) = \frac{4}{3} n + \frac{1}{\binom{n}{2}} \sum_{1 \le p < q \le n} \left( \frac{f_q \cdot s}{n - 2} + \frac{f_p \cdot \ell}{n - 2} \right) + o(n), \tag{7}$$

where we call the $o(n)$ term the error term. We will now investigate the contribution
of the error term to the expected number of key comparisons.

As is well known from the traditional analysis of quicksort, the recursion
tree for a random input has depth $O(\log n)$ with high probability, e.g., with
probability $1 - 1/n^2$. On each level, the sizes $n'_1, \ldots, n'_k$ of the inputs on this
level sum up to at most $n$, and for the contributions of the error terms we get
$o(n'_1) + \ldots + o(n'_k) = o(n)$. Conditioning on the event that the recursion tree has
depth $O(\log n)$, the contribution of the error terms to the expected number of
key comparisons is thus $o(n \ln n)$.

Second, considering the actual problem of sorting an input using a dual
pivot quicksort algorithm, we remark that the tail bounds used in the proof of
Theorem 3 do not give low enough error probabilities if the subarrays appearing
in the recursion are too small. We now consider the expected comparison costs
of sorting subarrays of small size separately.

Let $n_0 = n^{1/\log\log n}$. Let $n_1, \ldots, n_k$ be the sizes of the subarrays that have size
at most $n_0$ in the recursion. Because each element appears in at most one such
subarray, we have $n_1 + \ldots + n_k \le n$. In the worst case, partitioning a subarray
of size $n_i$ costs $2 n_i$ key comparisons, which, using (4), results in an expected
comparison count of at most $2.4 n_i \ln n_i + O(n_i)$ to sort this subarray. The expected
number of comparisons to sort all the subarrays is hence $O(n \ln n_0) = o(n \ln n)$.
So, sorting these small subarrays contributes an additive summand of $o(n \ln n)$
to the expected number of key comparisons.

These two observations say that for $E(P_n) = a \cdot n + o(n)$ we have $E(C_n) =
6/5 \cdot a \cdot n \ln n + o(n \ln n)$ when utilizing Theorem 3.



4 Analysis of Some Partitioning Methods 

In this section, we will study three different partitioning methods in the light of 
the formulas from Section 2. First, we will analyze one of the simplest possible
ideas to classify each element: Always use p for the first comparison. Second, we 
will consider a strategy based on Yaroslavskiy's algorithm. Third, we will study a 
strategy presented in Sedgewick's dissertation [9] and a small modification of it. 



4.1 Always Compare to p First 



We consider strategy V: For given $p$ and $q$, always compare with $p$ first. This
strategy makes sure that $f_p = n - 2$ and $f_q = 0$. Plugging this into (7) gives⁴

$$E(P_n^{V}) = \frac{4}{3} n + \frac{1}{\binom{n}{2}} \sum_{s + \ell \le n - 2} \ell + o(n) = \frac{5}{3} n + o(n).$$

Using (4) we get $E(C_n^{V}) = 2 n \ln n + o(n \ln n)$, the same as in standard quicksort.



4.2 Analysis of Yaroslavskiy's Algorithm 

Following [12, Section 3.2], Yaroslavskiy's algorithm is an implementation of
the following strategy Y: For given $p$ and $q$, compare $\ell$ elements to $q$ first, and
compare the other elements to $p$ first.

We get that $f_q = \ell$ and $f_p = s + m$. Applying (7), we calculate

$$E(P_n^{Y}) = \frac{4}{3} n + \frac{1}{\binom{n}{2}} \sum_{s + \ell \le n - 2} \left( \frac{s \ell}{n - 2} + \frac{(s + m) \ell}{n - 2} \right) + o(n) = \frac{19}{12} n + o(n).$$

Using (4) gives $E(C_n^{Y}) = 1.9 n \ln n + o(n \ln n)$, as in [12].

⁴ We omit detailed step-by-step calculations. All calculations can be checked using a
standard computer algebra system.

4.3 Analysis of Sedgewick's Algorithm

Following [12, Section 3.2], Sedgewick's algorithm amounts to an implementation
of the following strategy S: For given $p$ and $q$, compare (in expectation) a fraction
of $s/(s + \ell)$ of the keys with $q$ first, and compare the other keys with $p$ first. We
get $f_q = (n - 2) \cdot s/(s + \ell)$ and $f_p = (n - 2) \cdot \ell/(s + \ell)$. Plugging these values
into (7), we calculate

$$E(P_n^{S}) = \frac{4}{3} n + \frac{1}{\binom{n}{2}} \sum_{s + \ell \le n - 2} \left( \frac{s^2}{s + \ell} + \frac{\ell^2}{s + \ell} \right) + o(n) = \frac{16}{9} n + o(n).$$

Applying (4) gives $E(C_n^{S}) = 2.133\ldots \cdot n \ln n + o(n \ln n)$, as known from [12].

Obviously, this is worse than strategy V considered before. This is easily 
explained intuitively: If the fraction of small elements is large, it will compare 
many elements with q first. But this costs two comparisons for each small element. 
Conversely, if the fraction of large elements is large, it will compare many elements 
to p first, which is again the wrong decision. 

Since Sedgewick's strategy seems to do exactly the opposite of what one
should do to lower the comparison count, we consider the following modified
strategy S': For given $p$ and $q$, compare (in expectation) a fraction of $s/(s + \ell)$
of the keys with $p$ first, and compare the other keys with $q$ first.

Using the same analysis as above, we get $E(P_n^{S'}) = \frac{14}{9} n + o(n)$, which yields
$E(C_n^{S'}) = 1.866\ldots \cdot n \ln n + o(n \ln n)$, improving on the standard algorithm and
even on Yaroslavskiy's algorithm!
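The partitioning costs derived in this section can be checked numerically by evaluating the sum in (7) with exact rational arithmetic and dropping the $o(n)$ term. The following Python sketch does this for the four strategies of this section. It is an illustration under stated assumptions: the helper names `partition_cost`, `sedgewick_fq`, `strategies`, and `coefficients` are hypothetical, and for $s + \ell = 0$ (where the fraction $s/(s + \ell)$ is undefined and the choice does not affect the cost) $f_q$ is simply set to 0 for strategy S.

```python
from fractions import Fraction

def partition_cost(n, first_with_q):
    """4n/3 plus the averaged additional cost term of (7), without the o(n) term.

    first_with_q(s, m, l) returns f_q, the expected number of elements
    compared with q first; f_p is then n - 2 - f_q.
    """
    total = Fraction(0)
    pairs = 0
    for p in range(1, n):
        for q in range(p + 1, n + 1):
            s, m, l = p - 1, q - p - 1, n - q
            f_q = Fraction(first_with_q(s, m, l))
            f_p = (n - 2) - f_q
            total += (f_q * s + f_p * l) / (n - 2)
            pairs += 1
    return Fraction(4 * n, 3) + total / pairs

def sedgewick_fq(s, m, l):
    # Strategy S: a fraction s/(s+l) of all n-2 keys is compared with q first.
    return Fraction((s + m + l) * s, s + l) if s + l > 0 else 0

strategies = {
    "V (always p first)": lambda s, m, l: 0,
    "Y (Yaroslavskiy)": lambda s, m, l: l,
    "S (Sedgewick)": sedgewick_fq,
    "S' (modified)": lambda s, m, l: (s + m + l) - sedgewick_fq(s, m, l),
}

def coefficients(n):
    """Map each strategy name to E(P_n)/n, computed without the o(n) term."""
    return {name: float(partition_cost(n, f) / n) for name, f in strategies.items()}
```

For moderate $n$ the four values are already close to $5/3$, $19/12$, $16/9$, and $14/9$, respectively; multiplying by $6/5$ gives the leading coefficients $2$, $1.9$, $2.133\ldots$, and $1.866\ldots$ stated above.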



Remark. Exchanging $f_p$ and $f_q$ as in the strategy described above is a general technique. In fact, if the leading coefficient of the expected number of comparisons for a fixed choice of $f_p$ and $f_q$ is $a$, e.g., $a = 2.133\ldots$ for strategy $\mathcal{S}$, then the leading coefficient of the strategy that exchanges $f_p$ and $f_q$ is $4 - a$, e.g., $4 - 2.133\ldots = 1.866\ldots$ as in strategy $\mathcal{S}'$.

To make this precise, let $\mathcal{A}$ be a strategy that uses a fixed choice of $f_p$ and $f_q$. Let $\mathcal{A}'$ be a strategy that uses $f'_p = f_q$ and $f'_q = f_p$, i.e., exchanges the expected number of comparisons to $p$ resp. $q$ first. Since $f_p = n - 2 - f_q$, summing up the additional cost terms of $\mathcal{A}$ and $\mathcal{A}'$ in (7) gives us

$$\frac{1}{\binom{n}{2}} \sum_{1 \le p < q \le n} \left( \frac{f_q \cdot s}{n-2} + \frac{f_p \cdot \ell}{n-2} \right) + \frac{1}{\binom{n}{2}} \sum_{1 \le p < q \le n} \left( \frac{f_p \cdot s}{n-2} + \frac{f_q \cdot \ell}{n-2} \right) = \frac{1}{\binom{n}{2}} \sum_{1 \le p < q \le n} (s + \ell) = \frac{2}{3}(n-2).$$

So, if the additional cost term of $\mathcal{A}$ is bounded by $b \cdot n + O(1)$, for a constant $b$, then the additional cost term of $\mathcal{A}'$ is bounded by $(2/3 - b) \cdot n + O(1)$. Now let $a = 6/5 \cdot (4/3 + b)$, i.e., $E(C_n^{\mathcal{A}}) = a \cdot n \ln n + o(n \ln n)$. Solving the quicksort recurrence, we obtain by a standard calculation

$$E(C_n^{\mathcal{A}'}) = \frac{6}{5} \cdot \left( \frac{4}{3} + \frac{2}{3} - b \right) \cdot n \ln n + o(n \ln n) = (4 - a)\, n \ln n + o(n \ln n),$$

which precisely describes the influence of exchanging $f_p$ and $f_q$ on the value of the expected comparison count.

5 We remark that in his thesis Sedgewick [9] focused on the expected number of swaps.

6 Implementations of strategy $\mathcal{S}$ and $\mathcal{S}'$ can be found in Appendix B.1.
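As a worked instance of this remark, take Sedgewick's strategy, whose partitioning cost of $\frac{16}{9} n$ corresponds to an additional cost term with $b = \frac{16}{9} - \frac{4}{3} = \frac{4}{9}$. The arithmetic below only restates values given in the text:

$$a = \frac{6}{5}\left(\frac{4}{3} + \frac{4}{9}\right) = \frac{32}{15} = 2.133\ldots, \qquad b' = \frac{2}{3} - \frac{4}{9} = \frac{2}{9}, \qquad a' = \frac{6}{5}\left(\frac{4}{3} + \frac{2}{9}\right) = \frac{28}{15} = 1.866\ldots = 4 - a.$$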

5 An Optimal Partitioning Method 

Looking at the previous sections, all methods used the idea that we should compare a certain fraction of the elements to $p$ first, and all other elements to $q$ first. On the other hand, the optimal methods presented in Section 3 were highly adaptive: they count the number of small and large elements seen so far and use this to decide which pivot is chosen for the first comparison for the next element. In this section, we will show that the following strategy $\mathcal{D}$ is optimal: If $s > \ell$ then always compare with $p$ first, otherwise, always compare with $q$ first.

Of course, for an implementation of this strategy we have to deal with the 
problem of finding out which case applies before all comparisons have been made. 
We consider a solution to this problem at the end of this section. 

5.1 Analysis and Optimality of the Ideal Classification Strategy 

Assume for a moment that for a given random input with pivots $p, q$, the strategy "magically" knows if $s > \ell$ and correctly determines the pivot that should be used for all comparisons. For the expected number of key comparisons one obtains by a standard calculation:

$$E(P_n^{\mathcal{D}}) = \frac{4}{3} n + \frac{1}{\binom{n}{2}} \sum_{s+\ell \le n-2} \min(s, \ell) + o(n) = \frac{3}{2} n + o(n). \tag{8}$$

Solving the quicksort recurrence, the total expected number of key comparisons is $E(C_n^{\mathcal{D}}) = 1.8\, n \ln n + o(n \ln n)$, which is by $0.1\, n \ln n$ smaller than the expected number of key comparisons in Yaroslavskiy's algorithm.
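The value 1.8 can be reproduced by evaluating the sum over $\min(s, \ell)$ exactly for a finite n. This is a sketch under the assumption that the o(n) terms are simply dropped; the function name is hypothetical.

```python
from math import comb

def ideal_strategy_coefficient(n: int) -> float:
    """Return 6/5 * E(P_n)/n for the ideal strategy, ignoring o(n) terms."""
    total = 0
    for s in range(n - 1):              # s = number of small elements
        for l in range(n - 1 - s):      # l = number of large elements, s + l <= n - 2
            total += min(s, l)          # ideal strategy pays one extra comparison per minority element
    partition_cost = 4 * n / 3 + total / comb(n, 2)
    return 6 / 5 * partition_cost / n

print(round(ideal_strategy_coefficient(400), 3))
```

The printed value is slightly below 1.8 for finite n and approaches 1.8 as n grows.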

To see that this method is optimal, recall that the additional cost term is the only part of the expected partitioning cost that can be influenced by the algorithm. By Theorem 3, this term is fully determined by the parameters $f_q$ and $f_p$ up to lower order terms. Since $f_p = n - 2 - f_q$, and

$$E(s_q + \ell_p) = f_q \cdot \frac{s}{n-2} + (n - 2 - f_q) \cdot \frac{\ell}{n-2} + o(n)$$

is linear in $f_q$, it attains its minimum either at $f_q = 0$ or $f_q = n - 2$. But this is just what $\mathcal{D}$ chooses. So, $\mathcal{D}$ is optimal up to lower order terms w.r.t. minimizing the expected number of key comparisons.

5.2 Guessing Whether $s > \ell$ or not



We explain how the ideal classification algorithm can be approximated by an implementation. The idea is simply to make a few comparisons and use the outcome as a basis for a guess.

After $p$ and $q$ are chosen, we classify samplesize many elements and calculate $d = s' - \ell'$, where $s'$ and $\ell'$ are the numbers of small and large elements, resp., in the sample. If $d < 0$, we guess that $s < \ell$, so we classify by comparing with $q$ first; otherwise, we compare with $p$ first. We say that we guess correctly if the value of $d$ correctly reflects whether $s < \ell$.
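The guessing rule can be sketched in a few lines. The function name and the string return values are illustrative choices, not taken from the paper; the sample is assumed to contain no pivot elements.

```python
def guess_first_pivot(sample, p, q):
    """Return "q" if the sample suggests s < l (d < 0), otherwise "p"."""
    small = sum(1 for x in sample if x < p)   # s': small elements in the sample
    large = sum(1 for x in sample if x > q)   # l': large elements in the sample
    d = small - large
    return "q" if d < 0 else "p"

print(guess_first_pivot([45, 60, 3, 70], p=10, q=40))
```

In this call the sample holds one small and three large elements, so $d = -2$ and the remaining elements would be compared with $q$ first.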

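We incorporate guessing errors into (8) as follows:

$$\begin{aligned}
E(P_n) &= \frac{4}{3} n + o(n) + \frac{1}{\binom{n}{2}} \sum_{s+\ell \le n-2} \Bigl( \Pr(\text{guess correct}) \cdot \min(s, \ell) + \Pr(\text{guess wrong}) \cdot \max(s, \ell) \Bigr) \\
&= \frac{4}{3} n + o(n) + \frac{2}{\binom{n}{2}} \sum_{s=0}^{n/2} \sum_{\ell=s+1}^{n-s} \Bigl( \Pr(\text{guess correct}) \cdot s + \Pr(\text{guess wrong}) \cdot \ell \Bigr). \qquad (9)
\end{aligned}$$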

The following lemma says that for a wide range of values $s$ and $\ell$, the probability of a guessing error is exponentially small.

Lemma 1. Let $s$ and $\ell$ with $s \le \ell - n^{3/4}$ and $\ell \ge n^{3/4}$ for $n \in \mathbb{N}$ be given. Let samplesize $= n^{2/3}$. Then $\Pr(d > 0) \le \exp(-2n^{1/6}/9)$.

Proof. Let $n' = n^{2/3}$. Let $X_i$ be a random variable that is $-1$ if the $i$-th classified element is a large element, $0$ if it is a medium element, and $1$ if it is a small element. Let $d = \sum_{i=1}^{n'} X_i$.

Using the assumptions on the values of $s$ and $\ell$, straightforward calculations show that $E(d) \le -n'/n^{1/4} = -n^{5/12}$. Furthermore, we have that

$$c_i = \bigl| E(d \mid X_1, \dots, X_i) - E(d \mid X_1, \dots, X_{i-1}) \bigr| \le 3, \quad i \in \{1, \dots, n'\}.$$

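To see this, we let $s_i$ resp. $\ell_i$ denote the number of small resp. large elements that are still present if $X_1, \dots, X_i$ are classified. Furthermore, let $Y_i$ be the 0-1 random variable that is 1 if $X_i$ is 1, and let $Z_i$ be the 0-1 random variable that is 1 if $X_i$ is $-1$.

We calculate:

$$\begin{aligned}
&\bigl| E(d \mid X_1, \dots, X_i) - E(d \mid X_1, \dots, X_{i-1}) \bigr| \\
&= \Bigl| \sum_{j=1}^{i} X_j + \sum_{j=i+1}^{n'} \bigl( \Pr(X_j = 1 \mid X_1, \dots, X_i) - \Pr(X_j = -1 \mid X_1, \dots, X_i) \bigr) \\
&\qquad - \sum_{j=1}^{i-1} X_j - \sum_{j=i}^{n'} \bigl( \Pr(X_j = 1 \mid X_1, \dots, X_{i-1}) - \Pr(X_j = -1 \mid X_1, \dots, X_{i-1}) \bigr) \Bigr| \\
&= \Bigl| X_i + \sum_{j=i+1}^{n'} \frac{s_i - \ell_i}{n - i} - \sum_{j=i}^{n'} \frac{s_{i-1} - \ell_{i-1}}{n - i + 1} \Bigr| \\
&= \Bigl| X_i + (n' - i) \cdot \frac{s_{i-1} - Y_i - \ell_{i-1} + Z_i}{n - i} - (n' - i + 1) \cdot \frac{s_{i-1} - \ell_{i-1}}{n - i + 1} \Bigr| \\
&= \Bigl| X_i + \frac{(n' - i) \cdot (Z_i - Y_i)}{n - i} + (\ell_{i-1} - s_{i-1}) \cdot \Bigl( \frac{n' - i + 1}{n - i + 1} - \frac{n' - i}{n - i} \Bigr) \Bigr| \\
&= \Bigl| X_i + \frac{(n' - i) \cdot (Z_i - Y_i)}{n - i} + (\ell_{i-1} - s_{i-1}) \cdot \frac{n - n'}{(n - i)(n - i + 1)} \Bigr| \\
&\le |X_i| + \Bigl| \frac{(n' - i) \cdot (Z_i - Y_i)}{n - i} \Bigr| + 1 \le 3.
\end{aligned}$$

Using the method of averaged bounded differences (see [H Theorem 5.3]) with the bounds $c_i \le 3$ on these differences, we get

$$\Pr(d > E(d) + t) \le \exp\left( -\frac{2t^2}{\sum_{i \le n'} c_i^2} \right),$$

which, with $t = n^{5/12}$, yields

$$\Pr(d > 0) \le \exp\left( -\frac{2n^{5/6}}{9n^{2/3}} \right) = \exp\left( -\frac{2n^{1/6}}{9} \right). \qquad \Box$$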
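To get a feeling for the bound of Lemma 1, it can be evaluated for concrete input sizes. This is only a numeric illustration of the stated expression; the function name is hypothetical.

```python
import math

def lemma1_error_bound(n: int) -> float:
    """Evaluate exp(-2 * n^(1/6) / 9), the bound on a wrong guess from Lemma 1."""
    return math.exp(-2 * n ** (1 / 6) / 9)

for n in (10**6, 10**12, 10**18):
    print(n, lemma1_error_bound(n))
```

The bound is asymptotic in nature: for $n = 10^6$ it is still about 0.11, and it only becomes very small for much larger n.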



Of course, we get an analogous result for $s \ge n^{3/4}$ and $\ell \le s - n^{3/4}$.

Our classification algorithm will now work as follows. It starts by sampling samplesize $= n^{2/3}$ many elements and then decides which pivot is used for the first comparison. Then it classifies each remaining element according to this decision.

We can now analyze the expected number of key comparisons of this algorithm. 

Theorem 4. Let samplesize $= n^{2/3}$. Then the expected comparison count of the algorithm described above is $1.8\, n \ln n + o(n \ln n)$.

Proof. First, sample samplesize many elements. The number of key comparisons to calculate $d$ is at most $2n^{2/3} = o(n)$. By symmetry, we may focus on the case that $s \le \ell$. We distinguish the following three cases:

1. $\ell \le n^{2/3}$: The contribution of terms in (9) for this case is at most

$$\frac{2}{\binom{n}{2}} \sum_{\ell=0}^{n^{2/3}} \sum_{s=0}^{\ell} \ell = o(n).$$

2. $\ell - n^{2/3} \le s \le \ell$: The contribution of terms in (9) in this case is at most

$$\frac{2}{\binom{n}{2}} \sum_{s=0}^{n/2} \ \sum_{\ell=s}^{\min(s + n^{2/3},\, n - s)} \ell = o(n).$$

3. $\ell \ge n^{2/3}$ and $s \le \ell - n^{2/3}$. Let $m(\ell) = \min(n - \ell, \ell - n^{2/3})$. Following Lemma 1, the probability of guessing wrong is at most $\exp(-2n^{1/6}/9)$. The contribution of this case in (9) is hence at most

$$\frac{2}{\binom{n}{2}} \sum_{\ell=n^{2/3}}^{n} \sum_{s=0}^{m(\ell)} \bigl( s + \exp(-2n^{1/6}/9) \cdot \ell \bigr) \le \frac{2}{\binom{n}{2}} \sum_{\ell=n^{2/3}}^{n} \sum_{s=0}^{m(\ell)} s + O(1).$$

Thus, the contribution of sampling and estimation errors is $o(n)$. As discussed in Section 2, small subarrays created in the recursion are directly sorted with a standard $O(n \ln n)$ sorting algorithm, which yields an additional summand of $o(n \ln n)$ to the expected number of key comparisons.

In conclusion, we expect a partitioning step to make $\frac{3}{2} n + o(n)$ key comparisons, see (8). Solving the quicksort recurrence, we get $E(C_n) = 1.8\, n \ln n + o(n \ln n)$. □



6 Choosing Pivots From a Sample 

We consider here the variation of dual pivot quicksort in which the two pivots are chosen from a sample of 5 elements. We choose the second-largest and fourth-largest as pivots. (This is the pivot choice that is used in Yaroslavskiy's algorithm in the JRE7 implementation, see [13] for further discussion.) The probability that $p$ and $q$, $p < q$, are chosen as pivots is exactly $(s \cdot m \cdot \ell)/\binom{n}{5}$. Following Hennequin [pp. 52-53], for partitioning costs $E(P_n) = a \cdot n + O(1)$ we get

$$E(C_n) = \frac{1}{H_6 - H_2} \cdot a \cdot n \ln n + O(n) = \frac{20}{19} \cdot a \cdot n \ln n + O(n), \qquad (10)$$

where $H_n$ denotes the $n$-th harmonic number. In our setting, applying Theorem 3 yields expected partitioning costs of $a \cdot n + o(n)$. Using the same argument as in Section 2, the expected comparison count becomes $20/19 \cdot a \cdot n \ln n + o(n \ln n)$.

We will now investigate the effect on the expected number of key comparisons in Yaroslavskiy's resp. our partitioning method. The expected number of medium elements remains $(n - 2)/3$. For strategy $\mathcal{Y}$, we calculate

$$E(P_n^{\mathcal{Y}}) = \frac{4}{3} n + o(n) + \frac{1}{\binom{n}{5}} \sum_{s+\ell \le n-3} \frac{\ell \cdot (2s + m)}{n-2} \cdot s \cdot m \cdot \ell = \frac{34}{21} n + o(n).$$

Applying (10), we get $E(C_n^{\mathcal{Y}}) = 1.704\, n \ln n + o(n \ln n)$ key comparisons. (Note that Wild et al. [13] calculated this leading coefficient as well.) This is slightly better than "clever quicksort", which uses the median of a sample of three elements as a single pivot element and achieves $1.714\, n \ln n + O(n)$ expected key comparisons [5]. For our proposed partitioning method, assuming no estimation errors, we get

$$E(P_n^{\mathcal{D}}) = \frac{4}{3} n + o(n) + \frac{2}{\binom{n}{5}} \sum_{\substack{s+\ell \le n-3 \\ s \le \ell}} s \cdot s \cdot m \cdot \ell = \frac{37}{24} n + o(n).$$

Again using (10), we obtain $E(C_n^{\mathcal{D}}) = 1.623\, n \ln n + o(n \ln n)$, which is optimal as well. By applying random sampling as above, we get an algorithm that makes $1.623\, n \ln n + o(n \ln n)$ expected key comparisons, improving further on the leading coefficient compared to clever quicksort and Yaroslavskiy's algorithm.
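The factor 20/19 and the two leading coefficients follow from exact rational arithmetic. A small sketch, with hypothetical variable names, that multiplies the partitioning cost constants 34/21 and 37/24 from the text by $1/(H_6 - H_2)$:

```python
from fractions import Fraction

def harmonic(k: int) -> Fraction:
    """Return the k-th harmonic number H_k as an exact fraction."""
    return sum((Fraction(1, i) for i in range(1, k + 1)), Fraction(0))

factor = 1 / (harmonic(6) - harmonic(2))      # translation factor from equation (10)
yaroslavskiy = factor * Fraction(34, 21)      # strategy Y with pivots from a sample of 5
ideal = factor * Fraction(37, 24)             # ideal strategy with pivots from a sample of 5

print(factor, float(yaroslavskiy), float(ideal))
```

Rounded to three decimals, the two coefficients are 1.704 and 1.623, matching the values stated above.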



7 Swaps In Dual Pivot Quicksort 

After the discussion of key comparisons in dual pivot quicksort, we now turn
our focus to minimizing swap operations. Fortunately, we can build on work that
has been done for the so-called Dutch National Flag (DNF) problem. This problem is
defined as follows: Given an array containing n elements, where each element 
is either red, blue or white, rearrange the elements using swaps such that they 
resemble the national flag of the Netherlands (red followed by white followed 
by blue). One immediately sees the connection between this problem and the 
partitioning problem in the context of dual pivot quicksort, for the partitioning 
problem deals with classifying elements into three different types called small, 
medium, and large, resp, and rearranging these elements such that small elements 
are followed by medium elements which are followed by large elements. We quickly 
review the work done on this problem, and then analyze the expected swap count 
of these algorithms when using them for dual pivot quicksort. Many results on 
the DNF cannot be applied directly, since the probability spaces differ. 



7.1 Algorithms for the Dutch National Flag Problem 

Here, we present three algorithms that solve the DNF problem. We state them in 
our notation, using small, medium, and large elements, resp., instead of red, white, 
and blue elements, resp. We assume that A is an array of length n containing 
small, medium, and large elements. 

The first algorithm is due to Dijkstra [3].

Algorithm 1 (SwapA).
procedure DijkstraSwap(A[1..n])

1 i := 1; j := n; k := n
2 while i ≤ j
3   classify A[j]
4     case small: swap(i, j); i++;
5     case medium: j--;
6     case large: swap(j, k); j--; k--;
7 end while
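A runnable version of Algorithm 1 helps to see which elements cause swaps. The sketch below is an illustration, not the authors' implementation: it uses 0-based indices, takes a hypothetical `classify` callback that returns "small", "medium" or "large", and counts swaps, including swaps of an element with itself.

```python
def dijkstra_swap(a, classify):
    """Rearrange list a in place into small | medium | large and return the swap count."""
    i, j, k = 0, len(a) - 1, len(a) - 1
    swaps = 0
    while i <= j:
        kind = classify(a[j])
        if kind == "small":
            a[i], a[j] = a[j], a[i]    # the uninspected element from position i moves to j
            swaps += 1
            i += 1
        elif kind == "medium":
            j -= 1
        else:                          # large: move behind the medium block
            a[j], a[k] = a[k], a[j]
            swaps += 1
            j -= 1
            k -= 1
    return swaps

p, q = 10, 20
kind_of = lambda x: "small" if x < p else ("large" if x > q else "medium")
data = [25, 3, 15, 30, 7, 12, 18, 1]
print(dijkstra_swap(data, kind_of), data)
```

On this input with three small, three medium and two large elements the procedure performs five swaps, one per small and one per large element.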



The second algorithm is due to Meyer [3]. It avoids swapping uninspected elements
(that might be small and hence already present on a suitable position) as done
in Algorithm 1 on line 4.

Algorithm 2 (SwapB).
procedure MeyerSwap(A[1..n])

i := 1; j := n; k := n
while i ≤ j
  classify A[j]
    case small:
      while A[i] is small
        i++;
      end while
      if i < j
        swap(i, j); i++;
    case medium: j--;
    case large: swap(j, k); j--; k--;
end while



The next algorithm was explicitly stated by Chen in [1], but has been discussed in
[10], too. Instead of classifying elements and moving them to their final position,
it uses two stages. In the first stage, it moves all small elements to the left of
the array. The second stage moves all large elements to the right, but does not
inspect the array part that contains the small elements.

Algorithm 3 (SwapC).
procedure SwapC(A[1..n])

i := 1; j := n;
while i < j
  while A[i] is small and i < j
    i++;
  end while
  while A[j] is not small and i < j
    j--;
  end while
  if i < j
    swap(i, j); i++; j--;
end while
if i < n
  j := i; k := n;
  while j < k
    while A[j] is medium and j < k
      j++;
    end while
    while A[k] is large and j < k
      k--;
    end while
    if j < k
      swap(j, k); j++; k--;
  end while



7.2 Analysis of the Expected Swap Count 

Given a strategy, e.g., the optimal strategy from Section 5, we get a partitioning
procedure for dual pivot quicksort by using Algorithm 1 or Algorithm 2, replacing
all classification steps according to the given strategy. In fact, Yaroslavskiy's
algorithm uses strategy $\mathcal{Y}$ together with Meyer's algorithm.

The situation is a little bit different with Algorithm [3j for it uses that the 
smaller pivot is used for the first comparison. (This case was analyzed as strategy 
V in Section g) 

For now, we focus on calculating the expected swap count of these algorithms. 
Note that these algorithms have been analyzed in the setting of the Dutch 
national flag problem (see, e.g., [1]), but in a different probability space, in which 
each element independently chooses to be small, medium or large with probability 
1/3. 

Analysis of Algorithm 1. Algorithm 1 has the following properties with respect
to swap operations:

(i) Each small element causes exactly one swap. 

(ii) Each large element causes exactly one swap. 

(iii) No medium element causes a swap operation. 

For the expected swap count $P_n$ for partitioning an array of length $n$ (conditioning on the pivot choices as before), we get the following formula:⁷

$$E(P_n) = \frac{1}{\binom{n}{2}} \sum_{s+\ell \le n} (s + \ell) = \frac{2}{3} n + O(1).$$

7 When analyzing a partitioning method for dual pivot quicksort, we have to consider
that pivots might have been swapped at the beginning. Furthermore, they have to be
placed into their correct position at the end. In total, we get an additive summand
of 5/2 to the expected number of swaps, cf. [12]. We omit this summand here.

Note that the solution of the quicksort recurrence also holds for the swap case (cf. [12]). For the total expected number of swaps $S_n$ for sorting an array of length $n$ using a dual pivot quicksort approach, we hence obtain $E(S_n) = 0.8\, n \ln n + O(n)$.

Analysis of Algorithm 2. Algorithm 2 has the following properties with respect
to swap operations for $s$ small and $\ell$ large elements:

(i) Each small element that resides in the array positions s + 1,.. . , n causes 
exactly one swap. 

(ii) Each large element causes exactly one swap. 

(iii) No medium element causes a swap. 

For the expected swap count $P_n$ for Algorithm 2 we get

$$E(P_n) = \frac{1}{\binom{n}{2}} \sum_{s+\ell \le n} \bigl( s \cdot (n - s)/n + \ell \bigr) = \frac{1}{2} n + O(1),$$

which yields a total expected swap count of $E(S_n) = 0.6\, n \ln n + O(n)$. Unsurprisingly, this is exactly the expected swap count of Yaroslavskiy's algorithm calculated by Wild and Nebel [12].

When we use the optimal strategy of Section 5, we can further improve this algorithm as follows: After the random sampling step, we make a guess whether $s > \ell$. If this is the case, we use Algorithm 2 and always compare to the smaller pivot first. Otherwise, we slightly change Algorithm 2 by letting $j$ run from 1 to $k$, swapping the small and large case, i.e., using the inner while loop when $A[j]$ is large to ignore large elements. Conditioning on $s$ and $\ell$, almost the same calculations as above show that this algorithm yields a total expected swap count of $0.45\, n \ln n + O(n)$.

Analysis of Algorithm 3. Algorithm 3 has the following behavior with respect to
swap operations for $s$ small and $\ell$ large elements.

(i) In the first stage, each small element that resides in the array positions
$s + 1, \dots, n$ causes exactly one swap.

(ii) In the second stage, each large element that resides in the array positions
$s + 1, \dots, n - \ell - 1$ causes exactly one swap.

(iii) No other element causes a swap. 

Given a random array containing s small, m medium, and I large elements, note 
that after the first stage the sequence of array elements at positions s + 1, . . . , n 
is still random, i.e., each such sequence is equally likely. (This is the same as 
the property that subarrays occurring in recursive steps of quicksort are fully 
random, cf. [HI Section 3.1].) 

For the expected swap count $P_n$ for Algorithm 3 we get

$$E(P_n) = \frac{1}{\binom{n}{2}} \sum_{s+\ell \le n} \bigl( s \cdot (n - s)/n + \ell \cdot m/(n - s) \bigr) = \frac{5}{18} n + O(1),$$

which gives $E(S_n) = 0.33\ldots\, n \ln n + O(n)$ and hence significantly decreases the
number of swap operations compared to the previous algorithms.
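The three swap coefficients can be compared side by side by evaluating the sums for a finite n. This is a sketch with hypothetical names that follows the formulas as stated (normalization by $\binom{n}{2}$, sums over $s + \ell \le n$) and multiplies the result by 6/5.

```python
from math import comb

def swap_coefficients(n: int) -> dict:
    """Return 6/5 * E(P_n)/n for the swap counts of Algorithms 1 to 3 (SwapA, SwapB, SwapC)."""
    totals = {"SwapA": 0.0, "SwapB": 0.0, "SwapC": 0.0}
    for s in range(n + 1):                    # s = number of small elements
        for l in range(n + 1 - s):            # l = number of large elements
            m = n - s - l                     # m = number of medium elements
            first_stage = s * (n - s) / n     # small elements outside the first s positions
            second_stage = l * m / (n - s) if s < n else 0.0
            totals["SwapA"] += s + l
            totals["SwapB"] += first_stage + l
            totals["SwapC"] += first_stage + second_stage
    return {name: 6 / 5 * t / comb(n, 2) / n for name, t in totals.items()}

print(swap_coefficients(400))
```

For growing n the values approach 0.8, 0.6 and 0.33..., respectively; the finite-n values are slightly larger because of the O(1) terms.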

Again, we can use the optimal strategy of Section 5 by deciding, after the random
sampling step, whether we use Algorithm 3 or its modified version, in which we
first split into large and non-large elements.

8 Experiments 

We have implemented the methods presented in this paper in C++. Pseudocode
for these methods can be found in Appendix B. These algorithms have not been
fine-tuned; this section is hence meant to provide initial results but does not
replace a thorough experimental study.

We have incorporated random sampling into the partitioning step of our
method from Section 5 by comparing the first $n' = \max(n/100, 7)$ elements with
$p$ first. We switched to comparing with $q$ first if the algorithm had seen more
large than small elements after $n'$ steps. We have used Algorithm 1 as the swap
strategy, because the exchange of the first comparison is simple with it.

Our experiments were carried out on an Intel Xeon E5645 at 2.4 GHz with
48 GB RAM running Ubuntu 12.04 with kernel version 3.2.0. The source code
was compiled with gcc using the -O2 optimization flag.

In Section 8.1, we will experimentally evaluate the comparison and swap count
of the algorithms considered in this paper. In Section 8.2, we will focus on the
actual running times needed to sort a given input. The charts of our experiments
can be found at the end of this paper.

8.1 Comparison and Swap Count 

We first have a look at the comparison and swap count needed to sort a random 
input of up to 50 000 000 integers. We did not switch to a different sorting 
algorithm, e.g., insertion sort, to sort short subarrays. 

Figure 4 shows the results of our experiments for algorithms that choose the
pivots directly. We see that the linear term in the expected comparison count
has a big influence on the number of comparisons. For Yaroslavskiy's algorithm
this linear term is $-2.46n$, as calculated by the authors of [12].

The results confirm our theoretical studies and show that our algorithm
beats all other algorithms with respect to the comparison count, although we
incorporated random sampling directly into the partitioning step. We also see that
the modified version of Sedgewick's algorithm beats Yaroslavskiy's algorithm. On
the other hand, Sedgewick's original algorithm makes the most key comparisons
and is even worse than standard quicksort, as described in Section 4.3.

Figure 5 shows the same experiment for the algorithms that choose the pivots
from a small sample. This plot confirms the theoretical results from Section 6.

Figure 6 shows the same experiment for the swap count. These results confirm
the theoretical study conducted in Section 7. We see that Sedgewick's strategy
and Dijkstra's strategy make the most swaps. Yaroslavskiy's algorithm is clearly
better and shows exactly the behavior calculated for a strategy that uses Meyer's
algorithm. We see that the modified version of Meyer's algorithm that makes
a small random sampling step and then decides which version it uses is better.
The best swap strategy, as calculated in Section 7, is Algorithm 3. It almost
matches the swap count of standard quicksort. However, in our initial experiments,
the running time of this variant was not competitive with algorithms that used
Algorithm 1 or Algorithm 2.

8.2 Running Times 

We now consider the running times of our algorithms to sort a given input. To 
measure running times, all algorithms used a small sample to choose the pivots 
from. Clever quicksort uses the median of a sample of three elements. Yaroslavskiy 
and our algorithm use the second- and fourth-largest elements from a sample of 
five elements. We sorted subarrays of size at most 16 directly using insertion sort. 

With respect to running times, we see that Yaroslavskiy's algorithm is superior
to the other algorithms when sorting random permutations of $\{1, \dots, n\}$. Our
method is about 5% slower, see Figure 7. When sorting strings (where key comparisons
become more expensive), our algorithm can actually beat Yaroslavskiy's
algorithm. However, the difference is only about 2%, see Figure 8.

9 Conclusion and Open Questions 

We have studied dual pivot quicksort algorithms in a unified way and found 
an optimal partitioning method w.r.t. minimizing the expected number of key 
comparisons. For the problem of minimizing swaps we have 

— pointed out that comparisons and swaps can be treated independently, and 

— brought back into the discussion the obvious connection between dual pivot 
quicksort and the DNF problem. 

While we are now in a situation in which we can present algorithms that lower
both the expected comparison count and the expected swap count compared
to Yaroslavskiy's algorithm, the most urgent question is to find an actual
implementation that translates these theoretical properties into better running
times.

Fig. 3. Visualization of the decision process when inspecting element $a_i$. Applying
strategy $\mathcal{O}$ from left to right uses that of the remaining elements three are small
and two are large, so it decides that $a_i$ should be compared with $p$ first. Applying
strategy $\mathcal{H}$ from right to left uses that of the inspected elements three were small
but only one was large, so it decides to compare $a_i$ with $p$ first, too. Note that
the strategies would differ if one small element were a medium element.



A Proof of Theorem 2

As mentioned in Section 3, the key idea to analyze the difference between strategy $\mathcal{O}$ and strategy $\mathcal{H}$ is considering the case that on an input $(a_1, \dots, a_n)$ strategy $\mathcal{H}$ inspects the elements in the order $a_1, \dots, a_n$, while strategy $\mathcal{O}$ inspects them in the order $a_n, \dots, a_1$. For a fixed element $a_i$, $\mathcal{H}$ uses knowledge on the exact number of small resp. large elements in $a_1, \dots, a_{i-1}$, while strategy $\mathcal{O}$ uses knowledge on the exact number of small resp. large elements in $a_1, \dots, a_i$. Thus, there is only a small difference in the decision process of these two strategies. In fact, they can only differ when the number of small and large elements in $a_1, \dots, a_i$ is exactly the same. See Figure 3 for a further explanation.

We can hence calculate the difference in the value of the additional cost term of strategy $\mathcal{H}$ and $\mathcal{O}$ just by analyzing the expected number of positions $2i$, $i \in \{1, \dots, n/2\}$, in which the number of small elements equals the number of large elements for a random input (so-called zero-crossings). Medium elements have no influence on this value, because both strategies need two comparisons for classification.

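By symmetry, we assume that the number of small elements is at most as large as the number of large elements. First, we omit medium elements to simplify calculations, i.e., we assume that the number of small and large elements is $n$. Let $Z_n$ be the random variable that counts the number of zero-crossings for an input of $n$ elements. We calculate:

$$\begin{aligned}
E(Z_n) &= \sum_{1 \le i \le n/2} \Pr(\text{there is a zero-crossing at position } 2i) \\
&= \frac{2}{n+1} \sum_{1 \le i \le n/2} \ \sum_{i \le s \le n/2} \Pr(\text{there is a zero-crossing at position } 2i \mid s \text{ small elements}) \\
&= \frac{2}{n+1} \sum_{1 \le i \le n/2} \ \sum_{i \le s \le n/2} \frac{\binom{2i}{i} \binom{n-2i}{s-i}}{\binom{n}{s}}.
\end{aligned}$$

By using the identity $\binom{2i}{i} = 2^{2i}\, \Gamma(i + 0.5)/(\sqrt{\pi}\, \Gamma(i + 1))$, which gives rise to $\binom{2i}{i} = \Theta(2^{2i}/\sqrt{i})$, and substituting $j = n/2 - s$, we continue by

$$\begin{aligned}
E(Z_n) &= \Theta\!\left(\frac{1}{n}\right) \sum_{i=1}^{n/2} \frac{2^{2i}}{\sqrt{i}} \sum_{s=i}^{n/2} \frac{\binom{n-2i}{s-i}}{\binom{n}{s}} \\
&= \Theta\!\left(\frac{1}{n}\right) \sum_{i=1}^{n/2} \frac{2^{2i}}{\sqrt{i}} \sum_{s=i}^{n/2} \frac{(n-2i) \cdots (n-i-s+1) \cdot s \cdots (s-i+1)}{n \cdots (n-s+1)} \\
&= \Theta\!\left(\frac{1}{n}\right) \sum_{i=1}^{n/2} \frac{2^{2i}}{\sqrt{i}} \sum_{j=0}^{n/2-i} \frac{(n-2i) \cdots (n/2-i+j+1) \cdot (n/2-j) \cdots (n/2-j-i+1)}{n \cdots (n/2+j+1)} \\
&= \Theta\!\left(\frac{1}{n}\right) \sum_{i=1}^{n/2} \frac{2^{2i}}{\sqrt{i}} \sum_{j=0}^{n/2-i} \frac{(n/2+j) \cdots (n/2-i+j+1) \cdot (n/2-j) \cdots (n/2-j-i+1)}{n \cdots (n-2i+1)} \\
&= \Theta\!\left(\frac{1}{n}\right) \sum_{i=1}^{n/2} \frac{1}{\sqrt{i}} \sum_{j=0}^{n/2-i} \frac{(n+2j) \cdots (n+2j-2(i-1)) \cdot (n-2j) \cdots (n-2j-2(i-1))}{n \cdots (n-2i+1)} \\
&= \Theta\!\left(\frac{1}{n}\right) \sum_{i=1}^{n/2} \frac{n+1}{\sqrt{i}\,(n-2i+1)} \sum_{j=0}^{n/2-i} \prod_{k=0}^{i-1} \frac{(n+2j-2k)(n-2j-2k)}{(n-2k+1)(n-2k)}.
\end{aligned}$$

We now continue by bounding the latter product.

$$\begin{aligned}
E(Z_n) &\le O\!\left(\frac{1}{n}\right) \sum_{i=1}^{n/2} \frac{n+1}{\sqrt{i}\,(n-2i+1)} \sum_{j=0}^{n/2-i} \prod_{k=0}^{i-1} \left( 1 - \left( \frac{2j}{n-2k} \right)^2 \right) \\
&\le O\!\left(\frac{1}{n}\right) \sum_{i=1}^{n/2} \frac{n+1}{\sqrt{i}\,(n-2i+1)} \sum_{j=0}^{n/2-i} \left( 1 - \left( \frac{2j}{n} \right)^2 \right)^{i} \\
&\le O\!\left(\frac{1}{n}\right) \sum_{i=1}^{n/2} \frac{n+1}{\sqrt{i}\,(n-2i+1)} \left( \int_0^{n/2} \left( 1 - \left( \frac{2t}{n} \right)^2 \right)^{i} dt + 1 \right).
\end{aligned}$$

Using a standard computer algebra system for the integral, we obtain

$$E(Z_n) \le O\!\left(\frac{1}{n}\right) \sum_{i=1}^{n/2} \frac{n+1}{\sqrt{i}\,(n-2i+1)} \left( \frac{\sqrt{\pi}\, n\, \Gamma(i+1)}{4\, \Gamma(i+3/2)} + 1 \right).$$

Since $\Gamma(i + 1)/\Gamma(i + 3/2) = \Theta(1/\sqrt{i})$, we continue by calculating

$$E(Z_n) \le O(1) \cdot \sum_{1 \le i \le n/2} \frac{n+1}{i\,(n-2i+1)} \le O(1) \cdot \left( \sum_{1 \le i \le n/4} \frac{1}{i} + \sum_{n/4+1 \le i \le n/2} \frac{1}{n-2i+1} \right) = O(\log n).$$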

Now, we consider the case that the input contains medium elements. Fix the number of small elements and the number of large elements. Since medium elements have no influence on the value of the additional cost term, we have that $E(Z_n \mid s \text{ small}, m \text{ medium}, \ell \text{ large elements}, s + m + \ell = n)$ equals $E(Z_{s+\ell} \mid s \text{ small}, \ell \text{ large elements})$, i.e., equals the expected value of zero-crossings for a smaller input containing $s + \ell$ elements.

We can now simply calculate the expected number of zero-crossings for an arbitrary input as follows:

$$E(Z_n) = \frac{1}{\binom{n}{2}} \sum_{s+\ell \le n} E(Z_n \mid s, \ell) = \frac{1}{\binom{n}{2}} \sum_{s+\ell \le n} E(Z_{s+\ell} \mid s, \ell) \le \frac{1}{\binom{n}{2}} \sum_{s+\ell \le n} O(\log(s + \ell)) = O(\log n),$$

which concludes the proof of Theorem 2. □

We will now show that such a small error term does not increase the expected comparison count by more than a linear term.

Lemma 2. Let $n \in \mathbb{N}$, and let $a$ be a constant with $a > 0$. If $E(P_n) = a \cdot n + O(\log n)$, then $E(C_n) = 6/5 \cdot a \cdot n \ln n + O(n)$.

Proof. We show by induction on the input size that the contribution of the $O(\log n)$ terms sums up to at most $O(n)$ for the total expected comparison count. To make this precise, let $c$ be a constant such that the error term is at most $c \ln n$. Let $E(A_n)$ denote the sum of the error terms in the expected comparison count. We will show that

$$E(A_n) \le C \cdot n - D \ln n. \qquad (11)$$

Let $D \ge c/5$. For the base case, let $n_0 \in \mathbb{N}$ and set $C$ such that $E(A_n) \le C \cdot n - D \ln n$ for all $n < n_0$. As the induction hypothesis, assume that (11) holds for all $n' < n$. For the induction step, we calculate:

$$\begin{aligned}
E(A_n) &= \frac{1}{\binom{n}{2}} \sum_{s+m+\ell=n} E(A_n \mid s, m, \ell) \\
&\le C \cdot n + \frac{1}{\binom{n}{2}} \sum_{s+m+\ell=n} \bigl( c \ln n - D \cdot (\ln s + \ln m + \ln \ell) \bigr) \\
&= C \cdot n + c \ln n - \frac{3}{\binom{n}{2}} \sum_{s+m+\ell=n} (D \cdot \ln s) \\
&= C \cdot n + c \ln n - \frac{6}{n} \sum_{1 \le s \le n} (D \cdot \ln s).
\end{aligned}$$

Now we use that $\sum_{i=a}^{b} f(i) \ge \int_a^b f(x)\, dx$ for monotonically increasing $f$ and obtain

$$E(A_n) \le C \cdot n + c \ln n - \frac{6D}{n} \cdot (n \ln n - n + 1) = C \cdot n + c \ln n - 6D \cdot (\ln n - 1 + 1/n).$$

Finally, we calculate

$$c \ln n - 6D (\ln n - 1 + 1/n) \le -D \ln n \iff c \ln n \le 5D \ln n - 6D + 6D/n.$$

Thus, the additional $O(\log n)$ terms sum up to $O(n)$ in the total expected comparison count. The leading term in the lemma follows directly from the solution of the quicksort recurrence. □



B Dual Pivot Quicksort Algorithms 

A dual pivot quicksort method has the following general form: 

Algorithm 4 (Dual Pivot Quicksort).
procedure DQS(A, left, right)

if right - left ≥ 1 then
  if A[left] > A[right] then
    swap A[left] and A[right]
  end if
  p := A[left];
  q := A[right];
  partition(A, p, q, left, right, pos_p, pos_q)
  DQS(A, left, pos_p - 1)
  DQS(A, pos_p + 1, pos_q - 1)
  DQS(A, pos_q + 1, right)
end if

To get an actual algorithm we have to implement a partition function that
partitions the input as depicted in Figure 1. A partition procedure in this paper
has two output variables pos_p and pos_q that are used to return the positions of
the two pivots in the partitioned array.

B.l Partitioning Methods Based on Sedgewick's Algorithm 

Algorithm 5 shows Sedgewick's partitioning method as studied in [5].

Algorithm 5 (Sedgewick's Partitioning Method).
procedure S-Partition(A, p, q, left, right, pos_p, pos_q)

i := i_1 := left; j := j_1 := right;
while true
  i := i + 1;
  while A[i] ≤ q
    if i ≥ j then break outer while end if
    if A[i] < p then A[i_1] := A[i]; i_1 := i_1 + 1; A[i] := A[i_1] end if
    i := i + 1;
  end while
  j := j - 1;
  while A[j] ≥ p
    if A[j] > q then A[j_1] := A[j]; j_1 := j_1 - 1; A[j] := A[j_1] end if
    if i ≥ j then break outer while end if
    j := j - 1;
  end while
  A[i_1] := A[j]; A[j_1] := A[i];
  i_1 := i_1 + 1; j_1 := j_1 - 1;
  A[i] := A[i_1]; A[j] := A[j_1];
end while
A[i_1] := p; A[j_1] := q;
pos_p := i_1; pos_q := j_1;



Algorithm 6 shows an implementation of the modified partitioning strategy from
Section 4.3.

Algorithm 6 (Sedgewick's Partitioning Method, modified).
procedure S2-Partition(A, p, q, left, right, pos_p, pos_q)

i := i_1 := left; j := j_1 := right;
while true
  i := i + 1;
  while true
    if i ≥ j then break outer while end if
    if A[i] < p then A[i_1] := A[i]; i_1 := i_1 + 1; A[i] := A[i_1]
    else if A[i] > q then break inner while end if
    i := i + 1;
  end while
  j := j - 1;
  while true
    if A[j] > q then A[j_1] := A[j]; j_1 := j_1 - 1; A[j] := A[j_1]
    else if A[j] < p then break inner while end if
    if i ≥ j then break outer while end if
    j := j - 1;
  end while
  A[i_1] := A[j]; A[j_1] := A[i];
  i_1 := i_1 + 1; j_1 := j_1 - 1;
  A[i] := A[i_1]; A[j] := A[j_1];
end while
A[i_1] := p; A[j_1] := q;
pos_p := i_1; pos_q := j_1;



B.2 Yaroslavskiy's Partitioning Method 

Algorithm 7 shows the partition method of Yaroslavskiy's algorithm as studied
in [12].

Algorithm 7 (Yaroslavskiy's Partitioning Method).
procedure Y-Partition(A, p, q, left, right, pos_p, pos_q)

l := left + 1; g := right - 1; k := l
while k ≤ g
  if A[k] < p
    swap A[k] and A[l]
    l := l + 1
  else
    if A[k] > q
      while A[g] > q and k < g do g := g - 1 end while
      swap A[k] and A[g]
      g := g - 1
      if A[k] < p
        swap A[k] and A[l]
        l := l + 1
      end if
    end if
  end if
  k := k + 1
end while
swap A[left] and A[l - 1]
swap A[right] and A[g + 1]
pos_p := l - 1; pos_q := g + 1;

B.3 Algorithms For the New Partitioning Method

Algorithm 8 presents the partitioning algorithm that always compares to the
smaller pivot first.

Algorithm 8 (Simple Partitioning Method (smaller pivot first)).
procedure SimplePartitionSmall(A, p, q, left, right, pos_p, pos_q)

l := left + 1; g := right - 1; k := l
while k ≤ g
  if A[k] < p
    swap A[k] and A[l]
    l := l + 1
    k := k + 1
  else
    if A[k] < q
      k := k + 1
    else
      swap A[k] and A[g]
      g := g - 1
    end if
  end if
end while
swap A[left] and A[l - 1]
swap A[right] and A[g + 1]
pos_p := l - 1; pos_q := g + 1
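Combining the general scheme of Algorithm 4 with the partitioning method of Algorithm 8 gives a complete sorting routine. The following is a minimal sketch under a few assumptions of its own: 0-based indices, pivot positions returned as a tuple instead of output variables, and no sampling step or insertion sort cutoff. The function names are hypothetical.

```python
def simple_partition_small(a, left, right):
    """Partition a[left..right] around p = a[left] <= q = a[right], comparing with p first."""
    p, q = a[left], a[right]
    l, g, k = left + 1, right - 1, left + 1
    while k <= g:
        if a[k] < p:                       # small: move into the left segment
            a[k], a[l] = a[l], a[k]
            l += 1
            k += 1
        elif a[k] < q:                     # medium: stays where it is
            k += 1
        else:                              # large: move to the right segment, re-inspect a[k]
            a[k], a[g] = a[g], a[k]
            g -= 1
    a[left], a[l - 1] = a[l - 1], a[left]      # place p
    a[right], a[g + 1] = a[g + 1], a[right]    # place q
    return l - 1, g + 1

def dual_pivot_quicksort(a, left=0, right=None):
    """Sort list a in place using two pivots per partitioning step."""
    if right is None:
        right = len(a) - 1
    if right - left >= 1:
        if a[left] > a[right]:
            a[left], a[right] = a[right], a[left]
        pos_p, pos_q = simple_partition_small(a, left, right)
        dual_pivot_quicksort(a, left, pos_p - 1)
        dual_pivot_quicksort(a, pos_p + 1, pos_q - 1)
        dual_pivot_quicksort(a, pos_q + 1, right)

values = [5, 2, 9, 1, 5, 6, 0, 3]
dual_pivot_quicksort(values)
print(values)
```

The recursion depth grows linearly on already sorted inputs in this sketch, so it is meant for illustration on small or random inputs only.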



Algorithm 9 presents the partitioning algorithm that always compares to the
larger pivot first.

Algorithm 9 (Simple Partitioning Method (larger pivot first)).
procedure SimplePartitionLarge(A, p, q, left, right, pos_p, pos_q)

l := left + 1; g := right - 1; k := l
while k ≤ g
  if A[k] > q
    swap A[k] and A[g]
    g := g - 1
  else
    if A[k] < p
      swap A[k] and A[l]
      l := l + 1
    end if
    k := k + 1
  end if
end while
swap A[left] and A[l - 1]
swap A[right] and A[g + 1]
pos_p := l - 1; pos_q := g + 1

Fig. 4. Comparison count (scaled by n ln n) needed to sort a random input of up
to n = 50 000 000 integers for algorithms that choose the pivots directly.

Fig. 5. Comparison count (scaled by n ln n) needed to sort a random input of up
to n = 50 000 000 integers for algorithms that choose the pivots from a sample.

Fig. 6. Swap count (scaled by n ln n) needed to sort a random input of up to
n = 50 000 000 integers for algorithms that choose the pivots directly.

Fig. 7. Running time (in milliseconds) needed to sort a random permutation of
{1, . . . , n}. Running times were averaged over 1000 trials for each n.

Fig. 8. Running time (in milliseconds) needed to sort a random set of n article
headers of the English Wikipedia. Running times were averaged over 1000 trials
for each n.